
gh-150228: Improve the PEP 829 batch processing APIs #150542

Merged
warsaw merged 21 commits into python:main from warsaw:gh150228 on Jun 2, 2026

Conversation

Member

@warsaw warsaw commented May 28, 2026

As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk, this exposes the batch processing APIs for addsitedir() and friends. We remove the defer_processing_start_files flag, which required implicit module-global state, and promote StartupState to the public, documented API. Callers can now control when accumulated .start and .pth file state is processed, if they want to.
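
To illustrate the difference between an implicit module-global flag and an explicit accumulator, here is a minimal toy sketch. `ToyStartupState` and its methods are hypothetical stand-ins written for this example only. It is not the `site.StartupState` implementation, and it touches neither `sys.path` nor the filesystem.

```python
class ToyStartupState:
    """Hypothetical accumulator: the caller decides when pending work runs."""

    def __init__(self):
        self.pending = []    # work recorded but not yet performed
        self.processed = []  # work that process() has carried out

    def addsitedir(self, sitedir):
        # Only record the directory; nothing is executed here.
        self.pending.append(sitedir)

    def process(self):
        # Drain everything accumulated so far, in insertion order.
        batch, self.pending = self.pending, []
        self.processed.extend(batch)
        return batch


state = ToyStartupState()
state.addsitedir("/opt/app/site-a")
state.addsitedir("/opt/app/site-b")
nothing_ran_early = state.processed == []  # True until process() is called
first_batch = state.process()
```

Because the state lives on an object the caller owns, two independent batches cannot interfere with each other, which is the property a module-global flag cannot offer.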

This also fixes the interleaving regression identified by @ncoghlan in the same issue. Now, .pth file sys.path extensions are added to sys.path after the sitedir that the .pth file is found in, restoring the legacy behavior.
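
The legacy ordering described here can be observed with the long-standing `site.addsitedir()` function alone. The following is a small sketch, assuming a CPython release where the legacy behavior holds; the directory name `extra` and the file name `demo.pth` are made up for the demonstration.

```python
import os
import site
import sys
import tempfile


def sitedir_then_pth_order():
    """Return (sitedir index, .pth entry index) within sys.path."""
    saved_path = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            sitedir = os.path.abspath(tmp)
            extra = os.path.join(sitedir, "extra")
            os.mkdir(extra)
            # A .pth line naming an existing directory extends sys.path.
            pth = os.path.join(sitedir, "demo.pth")
            with open(pth, "w", encoding="utf-8") as f:
                f.write("extra\n")
            site.addsitedir(sitedir)
            return sys.path.index(sitedir), sys.path.index(extra)
    finally:
        sys.path[:] = saved_path  # undo the changes made by the demo


sitedir_index, pth_index = sitedir_then_pth_order()
```

The site directory's index is expected to be lower than the index of the entry contributed by its `.pth` file.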

Along the way, I've made a lot of improvements to function docstrings, site.rst documentation, and comments in the code explaining what's going on.

@warsaw warsaw self-assigned this May 28, 2026
@warsaw warsaw requested a review from FFY00 as a code owner May 28, 2026 01:35
@warsaw warsaw added the 3.15 pre-release feature fixes, bugs and security fixes label May 28, 2026
@warsaw warsaw requested a review from AA-Turner as a code owner May 28, 2026 01:35
@warsaw warsaw added 3.16 new features, bugs and security fixes needs backport to 3.15 pre-release feature fixes, bugs and security fixes labels May 28, 2026
@warsaw warsaw requested review from hugovk and ncoghlan May 28, 2026 01:35

Comment thread Misc/NEWS.d/next/Library/2026-05-27-11-18-36.gh-issue-150228.pNPiO-.rst Outdated
Comment thread Misc/NEWS.d/next/Library/2026-05-27-11-18-36.gh-issue-150228.pNPiO-.rst Outdated
warsaw and others added 2 commits May 27, 2026 22:51
…NPiO-.rst

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
…NPiO-.rst

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
@warsaw warsaw requested a review from hugovk May 28, 2026 05:52
warsaw added 4 commits May 29, 2026 12:22
* Add a note that if known_paths is provided to StartupState.__init__(), it
  will get mutated in place.
* Improve some conditional flows.
* Improve some comments.
* Improve the what's new entry.
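
The in-place mutation noted in the first bullet is the same kind of effect the legacy `site.addsitedir(sitedir, known_paths)` function has on a caller-supplied set. The sketch below uses only that legacy function, as it behaves in released CPython versions, and makes no assumption about `StartupState.__init__()` itself.

```python
import os
import site
import sys
import tempfile

saved_path = list(sys.path)
known_paths = set()  # caller-owned set of normalized paths
with tempfile.TemporaryDirectory() as tmp:
    site.addsitedir(tmp, known_paths)
    # The set passed in now holds the case-normalized absolute path.
    normalized = os.path.normcase(os.path.abspath(tmp))
    was_recorded = normalized in known_paths
sys.path[:] = saved_path  # undo the sys.path change made above
```

Callers who need their original set untouched can pass a copy instead.
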
Comment thread Lib/test/test_site.py
Comment thread Doc/library/site.rst
Comment thread Doc/library/site.rst
Comment thread Lib/site.py Outdated
Comment thread Doc/library/site.rst Outdated
Comment thread Doc/library/site.rst
Comment thread Lib/site.py
Comment thread Lib/site.py Outdated
Comment thread Doc/library/site.rst Outdated
Comment thread Lib/test/test_site.py Outdated

bedevere-app Bot commented May 29, 2026

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Contributor

@ncoghlan ncoghlan left a comment

My approval is for the updated API design - much tidier without the implicit global state. Thanks @warsaw!

For the exact implementation and docs details, +1 to @gpshead's comments and questions. (I don't have any strong opinions on how the open questions should be resolved; I just agree there are some details still to be tweaked for consistency.)

warsaw added 3 commits May 31, 2026 10:21
* Add docs for site.makepath() and point the case-normalization requirement to
  this utility function.
* Note that StartupState.process() is not idempotent.
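
For context on the case-normalization point, `site.makepath()` already existed in `Lib/site.py` before it was documented. A short sketch of its behavior in released CPython versions follows; the path components `some` and `Dir` are arbitrary, and the newly added documentation text is not reproduced here.

```python
import os
import site

# makepath() joins its arguments and returns two forms of the result:
# the absolute path, and that path passed through os.path.normcase().
path, path_case = site.makepath("some", "Dir")

expected = os.path.abspath(os.path.join("some", "Dir"))
```

On case-sensitive platforms the two returned values are identical; on Windows the second one is lowercased with normalized separators.
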
This time, we get rid of the legacy implementation `reset` local, which was
always difficult to understand, and just implement a return value based on the
processing mode selected.
@warsaw warsaw requested a review from gpshead May 31, 2026 18:25
Member Author

warsaw commented May 31, 2026

@gpshead @ncoghlan thanks for the reviews! The PR is better because of them. I've pushed all the changes, so now's a good time to take another look; otherwise I will merge it tomorrow, all things being green.

Member Author

warsaw commented Jun 1, 2026

I have made the requested changes; please review again.


bedevere-app Bot commented Jun 1, 2026

Thanks for making the requested changes!

@ncoghlan, @gpshead: please review the changes made to this pull request.

@bedevere-app bedevere-app Bot requested a review from ncoghlan June 1, 2026 00:30
Member

@encukou encukou left a comment

Thanks! This looks like a better API design!

I still see two red flags here though: an argument that doesn't combine with other arguments, and (another instance of) changing the return type based on an argument.

Did you consider adding a StartupState.addsitedir(sitedir) method, instead of the startup_state argument?

Comment thread Doc/library/site.rst
Comment thread Doc/library/site.rst Outdated
Member Author

warsaw commented Jun 1, 2026

> I still see two red flags here though: an argument that doesn't combine with other arguments, and (another instance of) changing the return type based on an argument.

Sadly, we have to keep these warts to cater to the legacy APIs. We could potentially deprecate them, but I don't really see much value in that, other than API hygiene, and it's probably not worth the backward incompatibilities.
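
One long-standing instance of a return value that depends on an argument is the legacy `site.addsitedir()` function itself. The sketch below shows that behavior as it exists in released CPython versions up to 3.14; it says nothing about the intermediate 3.15 beta signatures discussed in this thread.

```python
import site
import sys
import tempfile

saved_path = list(sys.path)
with tempfile.TemporaryDirectory() as tmp:
    # Without known_paths, the function returns None.
    result_without = site.addsitedir(tmp)
    # With a caller-supplied set, that same set object comes back.
    known_paths = set()
    result_with = site.addsitedir(tmp, known_paths)
sys.path[:] = saved_path  # restore sys.path after the demo
```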

> Did you consider adding a StartupState.addsitedir(sitedir) method, instead of the startup_state argument?

I didn't, but that actually might be pretty cool.

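import site

state = site.StartupState(known_paths)
state.addsitedir(sitedir)  # sitedir, per the proposed method signature
state.process()            # process the accumulated .start and .pth state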

Then we only have to keep site.addsitedir() for backward compatibility, and then the question is whether we keep addsitedir(..., startup_state=None). We could probably get rid of that and just use the former everywhere internally to site.py.

@brettcannon brettcannon self-requested a review June 1, 2026 20:12
The comment by @encukou that started this change:

```
I still see two red flags here though: an argument that doesn't combine with
other arguments, and (another instance of) changing the return type based on
an argument.

Did you consider adding a StartupState.addsitedir(sitedir) method, instead of
the startup_state argument?
```

As it turns out, this is an even cleaner design.  By moving the bulk of the
previous module global functions into `StartupState` methods, we can get rid
of all the awkward `startup_state` keyword-only arguments which conflict
with `known_paths` (Petr's first point).  We can also get rid of the
return value dichotomy (Petr's second point) because now we can preserve
exactly the Python 3.14 API in the module global functions, and implement
the better APIs in the class methods.  We also generally don't have to
pass around `process_known_sitedirs`.

Now the following module global functions are essentially shims around
class methods:

* site.addsitedir() -> StartupState.addsitedir()
* site.addusersitepackages() -> StartupState.addusersitepackages()
* site.addsitepackages() -> StartupState.addsitepackages()
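
The shim arrangement listed above can be sketched in miniature as follows. `ToyState` and the module-level `addsitedir()` below are hypothetical illustrations of the pattern, not the code in `Lib/site.py`: the class owns the logic, and the function keeps a legacy-style signature and return value.

```python
class ToyState:
    """Hypothetical stand-in for a state class that owns the real logic."""

    def __init__(self, known_paths=None):
        # Reuse the caller's set when given, so updates are visible to them.
        self.known_paths = set() if known_paths is None else known_paths
        self.path = []

    def addsitedir(self, sitedir):
        if sitedir not in self.known_paths:
            self.known_paths.add(sitedir)
            self.path.append(sitedir)


def addsitedir(sitedir, known_paths=None):
    """Legacy-style module function: a thin shim over the method."""
    state = ToyState(known_paths)
    state.addsitedir(sitedir)
    # Keep the old contract: echo the caller's set back, otherwise None.
    return known_paths


shared = set()
returned = addsitedir("/srv/site", shared)
```

Keeping the awkward return contract confined to the shim leaves the method itself free to have a simpler interface.
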
Comment thread Lib/site.py Outdated
Comment thread Lib/site.py Outdated
Comment thread Lib/site.py Outdated
Member Author

warsaw commented Jun 1, 2026

> I didn't, but that actually might be pretty cool.

In fact, it was pretty cool. I really like where that headed, so now StartupState has public methods that the module global functions are effectively shims for.

@warsaw warsaw merged commit 27ebd9a into python:main Jun 2, 2026
52 checks passed
@miss-islington-app

Thanks @warsaw for the PR 🌮🎉.. I'm working now to backport this PR to: 3.15.
🐍🍒⛏🤖

@warsaw warsaw deleted the gh150228 branch June 2, 2026 01:43

bedevere-app Bot commented Jun 2, 2026

GH-150748 is a backport of this pull request to the 3.15 branch.

@bedevere-app bedevere-app Bot removed the needs backport to 3.15 pre-release feature fixes, bugs and security fixes label Jun 2, 2026
warsaw added a commit that referenced this pull request Jun 2, 2026
… (#150748)

gh-150228: Improve the PEP 829 batch processing APIs (GH-150542)

* gh-150228: Improve the PEP 829 batch processing APIs

As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk,
this implements the batch processing APIs for addsitedir() and friends.  We
remove the `defer_processing_start_files` flag which required some implicit
module global state, and promote StartupState to the public documented API.

This also moves the bulk of the module global functions into methods of the
`StartupState` class, so it removes the awkward APIs in 3.15b1.  Now, instances
of this class are an accumulator for startup state, using `StartupState.process()`
to process them.  Callers can now batch up startup state themselves by using
the methods on this class.  The module global functions are shims for this
which preserve the legacy APIs and semantics using the new state class.

This PR also fixes the interleaving regression identified by @ncoghlan in the
same issue.  Now, .pth file sys.path extensions are added to sys.path after
the sitedir that the .pth file is found in, restoring the legacy behavior.

Along the way, I've made a lot of improvements to function docstrings,
site.rst documentation, and comments in the code explaining what's going on.

* Add a note that if known_paths is provided to StartupState.__init__(), it
  will get mutated in place.
* Improve some conditional flows.
* Improve some comments.
* Improve the what's new entry.

* Make test_impl_exec_imports_suppressed_by_matching_start() more robust

Based on PR comment, we need to read both the .pth and .start files, and prove
that the .pth file's import line (which passes a bigger increment) is not
called, but the .start file's entry point (which uses the default increment)
is called.

* As per review, move some methods to the private API

_read_pth_file() and _read_start_file() are not intended to be part of the
public API surface outside of the site module, so even though they are used by
methods outside of the StartupState class, make them privately named.

* Resolve several review feedbacks

* Move a `versionadded`
* Better list comprehension formatting (use the output from
  `ruff format --line-length 78`)

* Add docs for site.makepath() and point the case-normalization requirement to
  this utility function.
* Note that StartupState.process() is not idempotent.

* Address another feedback comment

This time, we get rid of the legacy implementation `reset` local, which was
always difficult to understand, and just implement a return value based on the
processing mode selected.

* Changes based on gh-150228 review

The comment by @encukou that started this change:

```
I still see two red flags here though: an argument that doesn't combine with
other arguments, and (another instance of) changing the return type based on
an argument.

Did you consider adding a StartupState.addsitedir(sitedir) method, instead of
the startup_state argument?
```

As it turns out, this is an even cleaner design.  By moving the bulk of the
previous module global functions into `StartupState` methods, we can get rid
of all the awkward `startup_state` keyword-only arguments which conflict
with `known_paths` (Petr's first point).  We can also get rid of the
return value dichotomy (Petr's second point) because now we can preserve
exactly the Python 3.14 API in the module global functions, and implement
the better APIs in the class methods.  We also generally don't have to
pass around `process_known_sitedirs`.

Now the following module global functions are essentially shims around
class methods:

* site.addsitedir() -> StartupState.addsitedir()
* site.addusersitepackages() -> StartupState.addusersitepackages()
* site.addsitepackages() -> StartupState.addsitepackages()
* Additional minor changes
* Remove a now unused parameter

(cherry picked from commit 27ebd9a)

Co-authored-by: Barry Warsaw <barry@python.org>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>

Labels

3.15 pre-release feature fixes, bugs and security fixes 3.16 new features, bugs and security fixes

5 participants